Modus Questions: Query Models and Frequency in Russian Text Corpora
Abstract
The paper analyzes modus questions used in dialogues between native Russian speakers and discusses their quantitative properties. The research focuses on developing models that describe these questions, based on the Russian National Corpus and a newspaper corpus. The results can be applied in various areas of natural language processing, e.g. dialogue systems.
Related resources
Syntactic Complexity of Russian Unified State Exam Texts in English: A Study on Reliability and Validity
In this study we analyze texts used in the Russian Unified State Exam (USE) in English. The texts, which formed two small research corpora, were retrieved from two sources: the official USE database, as a reference point, and “Neznaika” (https://neznaika.pro/), a popular website used by pupils for USE training. The two corpora are balanced in size: USE has 11934 tokens and “Neznaika” has 11918 tokens. We share Biber’...
Dictionary of Abstract and Concrete Words of the Russian Language: A Methodology for Creation and Application
The paper describes the first stage of a project on creating an electronic dictionary with numerical estimates of the degree of abstractness and concreteness of Russian words. Our approach is to integrate data obtained from several different sources: text corpora, psycholinguistic experiments, published dictionaries, markers of abstractness (certain suffixes) and a translation of a similar dict...
Semantic Clustering of Russian Web Search Results: Possibilities and Problems
The present paper deals with word sense induction from lexical co-occurrence graphs. We construct such graphs on large Russian corpora and then apply the data to cluster the results of Mail.ru search according to meanings in the query. We compare different methods of performing such clustering and different source corpora. Models of applying distributional semantics to big linguistic data are d...
Constructions in Parallel Corpora: A Quantitative Approach
The primary goal of the present study is to find an adequate method for the quantitative analysis of empirical data obtained from parallel corpora. Such a task is particularly important in the case of fixed constructions possessing some degree of idiomaticity and language specificity. Our data consist of the Russian construction дело в том, что and its parallels in English, German and Swedish. ...
Comparison of High-Frequency Nouns from the Perspective of Large Corpora
Over the last decade a number of corpora have become available, many of them compiled automatically from web data. Such corpora differ from traditional text collections in both volume and content. The paper discusses these corpora and deals with two of them: ruTenTen (18.3 bln tokens) and Araneum Russicum Maximum (13.7 bln tokens). The authors discuss lin...